Primary Question: whether daily concentrations of PM 2.5 decreased in California over the 20 years spanning from 2002 to 2022.
Questoin 1
(30 points) Given the formulated question from the assignment description, you will now conduct EDA Checklist items 1-5. First, download 2002 and 2022 data for all sites in California from the EPA Air Quality Data website, then read the data into R. For each of the two datasets, check the dimensions, headers, footers, variable names and variable types. Check the distribution of the key variable we are analyzing (PM 2.5). Write up a summary of all of your findings.
Download and read the data
library(tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr 1.1.4 ✔ readr 2.1.5
✔ forcats 1.0.0 ✔ stringr 1.5.1
✔ ggplot2 3.5.2 ✔ tibble 3.3.0
✔ lubridate 1.9.4 ✔ tidyr 1.3.1
✔ purrr 1.1.0
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag() masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
Date Source Site.ID POC Daily.Mean.PM2.5.Concentration Units
1 01/05/2002 AQS 60010007 1 25.1 ug/m3 LC
2 01/06/2002 AQS 60010007 1 31.6 ug/m3 LC
3 01/08/2002 AQS 60010007 1 21.4 ug/m3 LC
4 01/11/2002 AQS 60010007 1 25.9 ug/m3 LC
5 01/14/2002 AQS 60010007 1 34.5 ug/m3 LC
6 01/17/2002 AQS 60010007 1 41.0 ug/m3 LC
Daily.AQI.Value Local.Site.Name Daily.Obs.Count Percent.Complete
1 81 Livermore 1 100
2 93 Livermore 1 100
3 74 Livermore 1 100
4 82 Livermore 1 100
5 98 Livermore 1 100
6 115 Livermore 1 100
AQS.Parameter.Code AQS.Parameter.Description Method.Code
1 88101 PM2.5 - Local Conditions 120
2 88101 PM2.5 - Local Conditions 120
3 88101 PM2.5 - Local Conditions 120
4 88101 PM2.5 - Local Conditions 120
5 88101 PM2.5 - Local Conditions 120
6 88101 PM2.5 - Local Conditions 120
Method.Description CBSA.Code
1 Andersen RAAS2.5-300 PM2.5 SEQ w/WINS 41860
2 Andersen RAAS2.5-300 PM2.5 SEQ w/WINS 41860
3 Andersen RAAS2.5-300 PM2.5 SEQ w/WINS 41860
4 Andersen RAAS2.5-300 PM2.5 SEQ w/WINS 41860
5 Andersen RAAS2.5-300 PM2.5 SEQ w/WINS 41860
6 Andersen RAAS2.5-300 PM2.5 SEQ w/WINS 41860
CBSA.Name State.FIPS.Code State County.FIPS.Code
1 San Francisco-Oakland-Hayward, CA 6 California 1
2 San Francisco-Oakland-Hayward, CA 6 California 1
3 San Francisco-Oakland-Hayward, CA 6 California 1
4 San Francisco-Oakland-Hayward, CA 6 California 1
5 San Francisco-Oakland-Hayward, CA 6 California 1
6 San Francisco-Oakland-Hayward, CA 6 California 1
County Site.Latitude Site.Longitude
1 Alameda 37.68753 -121.7842
2 Alameda 37.68753 -121.7842
3 Alameda 37.68753 -121.7842
4 Alameda 37.68753 -121.7842
5 Alameda 37.68753 -121.7842
6 Alameda 37.68753 -121.7842
tail(df_2002)
Date Source Site.ID POC Daily.Mean.PM2.5.Concentration Units
15971 12/10/2002 AQS 61131003 1 15 ug/m3 LC
15972 12/13/2002 AQS 61131003 1 15 ug/m3 LC
15973 12/22/2002 AQS 61131003 1 1 ug/m3 LC
15974 12/25/2002 AQS 61131003 1 23 ug/m3 LC
15975 12/28/2002 AQS 61131003 1 5 ug/m3 LC
15976 12/31/2002 AQS 61131003 1 6 ug/m3 LC
Daily.AQI.Value Local.Site.Name Daily.Obs.Count Percent.Complete
15971 62 Woodland-Gibson Road 1 100
15972 62 Woodland-Gibson Road 1 100
15973 6 Woodland-Gibson Road 1 100
15974 77 Woodland-Gibson Road 1 100
15975 28 Woodland-Gibson Road 1 100
15976 33 Woodland-Gibson Road 1 100
AQS.Parameter.Code AQS.Parameter.Description Method.Code
15971 88101 PM2.5 - Local Conditions 117
15972 88101 PM2.5 - Local Conditions 117
15973 88101 PM2.5 - Local Conditions 117
15974 88101 PM2.5 - Local Conditions 117
15975 88101 PM2.5 - Local Conditions 117
15976 88101 PM2.5 - Local Conditions 117
Method.Description CBSA.Code
15971 R & P Model 2000 PM2.5 Sampler w/WINS 40900
15972 R & P Model 2000 PM2.5 Sampler w/WINS 40900
15973 R & P Model 2000 PM2.5 Sampler w/WINS 40900
15974 R & P Model 2000 PM2.5 Sampler w/WINS 40900
15975 R & P Model 2000 PM2.5 Sampler w/WINS 40900
15976 R & P Model 2000 PM2.5 Sampler w/WINS 40900
CBSA.Name State.FIPS.Code State
15971 Sacramento--Roseville--Arden-Arcade, CA 6 California
15972 Sacramento--Roseville--Arden-Arcade, CA 6 California
15973 Sacramento--Roseville--Arden-Arcade, CA 6 California
15974 Sacramento--Roseville--Arden-Arcade, CA 6 California
15975 Sacramento--Roseville--Arden-Arcade, CA 6 California
15976 Sacramento--Roseville--Arden-Arcade, CA 6 California
County.FIPS.Code County Site.Latitude Site.Longitude
15971 113 Yolo 38.66121 -121.7327
15972 113 Yolo 38.66121 -121.7327
15973 113 Yolo 38.66121 -121.7327
15974 113 Yolo 38.66121 -121.7327
15975 113 Yolo 38.66121 -121.7327
15976 113 Yolo 38.66121 -121.7327
summary(df_2002)
Date Source Site.ID POC
Length:15976 Length:15976 Min. :60010007 Min. :1.000
Class :character Class :character 1st Qu.:60290014 1st Qu.:1.000
Mode :character Mode :character Median :60590007 Median :1.000
Mean :60549600 Mean :1.581
3rd Qu.:60731002 3rd Qu.:1.000
Max. :61131003 Max. :6.000
Daily.Mean.PM2.5.Concentration Units Daily.AQI.Value
Min. : 0.00 Length:15976 Min. : 0.00
1st Qu.: 7.00 Class :character 1st Qu.: 39.00
Median : 12.00 Mode :character Median : 56.00
Mean : 16.12 Mean : 59.28
3rd Qu.: 20.50 3rd Qu.: 72.00
Max. :104.30 Max. :185.00
Local.Site.Name Daily.Obs.Count Percent.Complete AQS.Parameter.Code
Length:15976 Min. :1 Min. :100 Min. :88101
Class :character 1st Qu.:1 1st Qu.:100 1st Qu.:88101
Mode :character Median :1 Median :100 Median :88101
Mean :1 Mean :100 Mean :88215
3rd Qu.:1 3rd Qu.:100 3rd Qu.:88502
Max. :1 Max. :100 Max. :88502
AQS.Parameter.Description Method.Code Method.Description CBSA.Code
Length:15976 Min. :117 Length:15976 Min. :12540
Class :character 1st Qu.:120 Class :character 1st Qu.:23420
Mode :character Median :120 Mode :character Median :40140
Mean :297 Mean :33270
3rd Qu.:707 3rd Qu.:41740
Max. :810 Max. :49700
NA's :929
CBSA.Name State.FIPS.Code State County.FIPS.Code
Length:15976 Min. :6 Length:15976 Min. : 1.00
Class :character 1st Qu.:6 Class :character 1st Qu.: 29.00
Mode :character Median :6 Mode :character Median : 59.00
Mean :6 Mean : 54.78
3rd Qu.:6 3rd Qu.: 73.00
Max. :6 Max. :113.00
County Site.Latitude Site.Longitude
Length:15976 Min. :32.63 Min. :-124.2
Class :character 1st Qu.:34.07 1st Qu.:-121.4
Mode :character Median :35.36 Median :-119.1
Mean :36.00 Mean :-119.4
3rd Qu.:37.77 3rd Qu.:-117.9
Max. :41.71 Max. :-115.5
hist(df_2002$Daily.Mean.PM2.5.Concentration)
Check the 2022 PM2.5 data
dim(df_2022)
[1] 59918 22
head(df_2022)
Date Source Site.ID POC Daily.Mean.PM2.5.Concentration Units
1 01/01/2022 AQS 60010007 3 12.7 ug/m3 LC
2 01/02/2022 AQS 60010007 3 13.9 ug/m3 LC
3 01/03/2022 AQS 60010007 3 7.1 ug/m3 LC
4 01/04/2022 AQS 60010007 3 3.7 ug/m3 LC
5 01/05/2022 AQS 60010007 3 4.2 ug/m3 LC
6 01/06/2022 AQS 60010007 3 3.8 ug/m3 LC
Daily.AQI.Value Local.Site.Name Daily.Obs.Count Percent.Complete
1 58 Livermore 1 100
2 60 Livermore 1 100
3 39 Livermore 1 100
4 21 Livermore 1 100
5 23 Livermore 1 100
6 21 Livermore 1 100
AQS.Parameter.Code AQS.Parameter.Description Method.Code
1 88101 PM2.5 - Local Conditions 170
2 88101 PM2.5 - Local Conditions 170
3 88101 PM2.5 - Local Conditions 170
4 88101 PM2.5 - Local Conditions 170
5 88101 PM2.5 - Local Conditions 170
6 88101 PM2.5 - Local Conditions 170
Method.Description CBSA.Code
1 Met One BAM-1020 Mass Monitor w/VSCC 41860
2 Met One BAM-1020 Mass Monitor w/VSCC 41860
3 Met One BAM-1020 Mass Monitor w/VSCC 41860
4 Met One BAM-1020 Mass Monitor w/VSCC 41860
5 Met One BAM-1020 Mass Monitor w/VSCC 41860
6 Met One BAM-1020 Mass Monitor w/VSCC 41860
CBSA.Name State.FIPS.Code State County.FIPS.Code
1 San Francisco-Oakland-Hayward, CA 6 California 1
2 San Francisco-Oakland-Hayward, CA 6 California 1
3 San Francisco-Oakland-Hayward, CA 6 California 1
4 San Francisco-Oakland-Hayward, CA 6 California 1
5 San Francisco-Oakland-Hayward, CA 6 California 1
6 San Francisco-Oakland-Hayward, CA 6 California 1
County Site.Latitude Site.Longitude
1 Alameda 37.68753 -121.7842
2 Alameda 37.68753 -121.7842
3 Alameda 37.68753 -121.7842
4 Alameda 37.68753 -121.7842
5 Alameda 37.68753 -121.7842
6 Alameda 37.68753 -121.7842
tail(df_2022)
Date Source Site.ID POC Daily.Mean.PM2.5.Concentration Units
59913 12/01/2022 AQS 61131003 1 3.4 ug/m3 LC
59914 12/07/2022 AQS 61131003 1 3.8 ug/m3 LC
59915 12/13/2022 AQS 61131003 1 6.0 ug/m3 LC
59916 12/19/2022 AQS 61131003 1 34.8 ug/m3 LC
59917 12/25/2022 AQS 61131003 1 23.2 ug/m3 LC
59918 12/31/2022 AQS 61131003 1 1.0 ug/m3 LC
Daily.AQI.Value Local.Site.Name Daily.Obs.Count Percent.Complete
59913 19 Woodland-Gibson Road 1 100
59914 21 Woodland-Gibson Road 1 100
59915 33 Woodland-Gibson Road 1 100
59916 99 Woodland-Gibson Road 1 100
59917 77 Woodland-Gibson Road 1 100
59918 6 Woodland-Gibson Road 1 100
AQS.Parameter.Code AQS.Parameter.Description Method.Code
59913 88101 PM2.5 - Local Conditions 145
59914 88101 PM2.5 - Local Conditions 145
59915 88101 PM2.5 - Local Conditions 145
59916 88101 PM2.5 - Local Conditions 145
59917 88101 PM2.5 - Local Conditions 145
59918 88101 PM2.5 - Local Conditions 145
Method.Description CBSA.Code
59913 R & P Model 2025 PM-2.5 Sequential Air Sampler w/VSCC 40900
59914 R & P Model 2025 PM-2.5 Sequential Air Sampler w/VSCC 40900
59915 R & P Model 2025 PM-2.5 Sequential Air Sampler w/VSCC 40900
59916 R & P Model 2025 PM-2.5 Sequential Air Sampler w/VSCC 40900
59917 R & P Model 2025 PM-2.5 Sequential Air Sampler w/VSCC 40900
59918 R & P Model 2025 PM-2.5 Sequential Air Sampler w/VSCC 40900
CBSA.Name State.FIPS.Code State
59913 Sacramento--Roseville--Arden-Arcade, CA 6 California
59914 Sacramento--Roseville--Arden-Arcade, CA 6 California
59915 Sacramento--Roseville--Arden-Arcade, CA 6 California
59916 Sacramento--Roseville--Arden-Arcade, CA 6 California
59917 Sacramento--Roseville--Arden-Arcade, CA 6 California
59918 Sacramento--Roseville--Arden-Arcade, CA 6 California
County.FIPS.Code County Site.Latitude Site.Longitude
59913 113 Yolo 38.66121 -121.7327
59914 113 Yolo 38.66121 -121.7327
59915 113 Yolo 38.66121 -121.7327
59916 113 Yolo 38.66121 -121.7327
59917 113 Yolo 38.66121 -121.7327
59918 113 Yolo 38.66121 -121.7327
summary(df_2022)
Date Source Site.ID POC
Length:59918 Length:59918 Min. :60010007 Min. : 1.000
Class :character Class :character 1st Qu.:60290019 1st Qu.: 1.000
Mode :character Mode :character Median :60631006 Median : 3.000
Mean :60564225 Mean : 3.768
3rd Qu.:60731026 3rd Qu.: 3.000
Max. :61131003 Max. :24.000
Daily.Mean.PM2.5.Concentration Units Daily.AQI.Value
Min. : -6.700 Length:59918 Min. : 0.00
1st Qu.: 4.100 Class :character 1st Qu.: 23.00
Median : 6.800 Mode :character Median : 38.00
Mean : 8.414 Mean : 39.22
3rd Qu.: 10.700 3rd Qu.: 54.00
Max. :302.500 Max. :454.00
Local.Site.Name Daily.Obs.Count Percent.Complete AQS.Parameter.Code
Length:59918 Min. :1 Min. :100 Min. :88101
Class :character 1st Qu.:1 1st Qu.:100 1st Qu.:88101
Mode :character Median :1 Median :100 Median :88101
Mean :1 Mean :100 Mean :88192
3rd Qu.:1 3rd Qu.:100 3rd Qu.:88101
Max. :1 Max. :100 Max. :88502
AQS.Parameter.Description Method.Code Method.Description CBSA.Code
Length:59918 Min. :143 Length:59918 Min. :12540
Class :character 1st Qu.:170 Class :character 1st Qu.:31080
Mode :character Median :170 Mode :character Median :40140
Mean :338 Mean :34971
3rd Qu.:707 3rd Qu.:41860
Max. :810 Max. :49700
NA's :4569
CBSA.Name State.FIPS.Code State County.FIPS.Code
Length:59918 Min. :6 Length:59918 Min. : 1.00
Class :character 1st Qu.:6 Class :character 1st Qu.: 29.00
Mode :character Median :6 Mode :character Median : 63.00
Mean :6 Mean : 56.28
3rd Qu.:6 3rd Qu.: 73.00
Max. :6 Max. :113.00
County Site.Latitude Site.Longitude
Length:59918 Min. :32.58 Min. :-124.2
Class :character 1st Qu.:34.07 1st Qu.:-121.4
Mode :character Median :36.49 Median :-119.6
Mean :36.25 Mean :-119.6
3rd Qu.:37.96 3rd Qu.:-117.9
Max. :41.76 Max. :-115.5
hist(df_2022$Daily.Mean.PM2.5.Concentration)
library(ggplot2)library(patchwork)ymax <-max( df_2002$Daily.Mean.PM2.5.Concentration, df_2022$Daily.Mean.PM2.5.Concentration)p1 <- df_2002 %>%ggplot(aes(x =1, y = Daily.Mean.PM2.5.Concentration)) +geom_violin(fill ="red", alpha =0.5) +geom_boxplot(width =0.1, fill ="white", outlier.shape =NA) +labs(title ="PM2.5 Distribution in 2002", y ="Daily Mean PM2.5 Concentration", x =NULL) +coord_cartesian(ylim =c(0, ymax)) +theme_bw()p2 <- df_2022 %>%ggplot(aes(x =1, y = Daily.Mean.PM2.5.Concentration)) +geom_violin(fill ="blue", alpha =0.5) +geom_boxplot(width =0.1, fill ="white", outlier.shape =NA) +labs(title ="PM2.5 Distribution in 2022", y ="Daily Mean PM2.5 Concentration", x =NULL) +coord_cartesian(ylim =c(0, ymax)) +theme_bw()p1 + p2
Summary: The 2002 dataset contains 15,976 records and 22 variables, including 9 character variables and 13 numeric variables. Some values for Core-Based Statistical Areas (CBSA) are missing. The histogram of PM2.5 shows a strongly right-skewed distribution.
The 2022 dataset contains 59,918 records and 22 variables. Similar to 2002, there are missing values in the CBSA code. Interestingly, the histogram of PM2.5 also shows a strongly right-skewed distribution, but it additionally includes negative PM2.5 values. Upon reviewing the data source, this may be due to instrumental factors. It is also worth noting that there are some outliers in the 2022 PM2.5 concentration data.
(10 points) Combine the two years of data into one data frame. Use the Date variable to create a new column for year, which will serve as an identifier. Change the names of the key variables so that they are easier to refer to in your code.
I changed the name of Daily.Mean.PM2.5.Concentration to PM2.5, Site.Longitude to Lon, and Site.Latitude to Lat.
Question 3
(20 points) Create a basic map (or maps) in leaflet showing the locations of the monitoring sites, using different colors for each year. Summarize the spatial distribution of the sites. Does this distribution change from 2002 to 2022?
library(leaflet)library(htmltools)map_2002 <- df %>%filter(Year =="2002") %>%leaflet() %>%addProviderTiles('OpenStreetMap') %>%addCircles(lat=~Lat,lng=~Lon, opacity=1, fillOpacity=1, radius=100, color="red") %>%addControl("<b>Map of Monitoring Sites in 2002</b>", position ="topright")map_2022 <- df %>%filter(Year =="2022") %>%leaflet() %>%addProviderTiles('OpenStreetMap') %>%addCircles(lat=~Lat,lng=~Lon, opacity=1, fillOpacity=1, radius=100, color="blue") %>%addControl("<b>Map of Monitoring Sites in 2022</b>", position ="topright")leaflet_grid <-tagList( tags$table(width ="100%", tags$tr( tags$td(map_2002), tags$td(map_2022) ) ) )leaflet_grid
pal <-colorFactor(palette =c("red", "blue"), domain = df$Year)leaflet() %>%addTiles() %>%addCircleMarkers(data = df %>%filter(Year ==2002),~Lon, ~Lat, color ="red", group ="2002", popup =~County,opacity =0.5, fillOpacity =0.5) %>%addCircleMarkers(data = df %>%filter(Year ==2022),~Lon, ~Lat, color ="blue", group ="2022", popup =~County,opacity =0.5, fillOpacity =0.5) %>%addLegend("bottomright", pal = pal, values = df$Year, title ="Year") %>%addLayersControl(overlayGroups =c("2002", "2022"))
df %>%distinct(Year, Site.ID) %>%count(Year, name ="n_sites")
Year n_sites
1 2002 103
2 2022 167
Summary: From the maps of monitoring station distribution, it was difficult to make a clear comparison between the two years. To address this, I overlaid the maps of 2002 and 2022, which allowed for a more direct comparison. The results show that there are more sites in 2022, with most concentrated in the southern part of California. A comparison of the number of monitoring sites also confirms this observation.
Question 4
(10 points) Check for any data issues such as missing or implausible values of PM in the combined dataset. Calculate the proportion of missing/implausible values for each year and report any temporal patterns you see in these observations.
# A tibble: 2 × 14
Year n_records n_missing q1 median q3 mean min max iqr
<chr> <int> <int> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 2002 15976 0 7 12 20.5 16.1 0 104. 13.5
2 2022 59918 0 4.1 6.8 10.7 8.41 -6.7 302. 6.6
# ℹ 4 more variables: lower_bound <dbl>, upper_bound <dbl>, n_outliers <int>,
# pct_outlier <dbl>
Summary: In the dataset, there were no missing values for PM2.5. For implausible values, I identified outliers using Tukey’s boxplot rule. The proportion of outliers was 6.8% in 2002 and 4.6% in 2022. It is also noteworthy that negative PM2.5 values appeared in 2022, which are likely due to instrument error or data recording issues.
Regarding temporal patterns, both the mean and median PM2.5 values decreased from 2002 to 2022, indicating an overall improvement in air quality over the past two decades. However, when classifying PM2.5 concentrations according to the U.S. Air Quality Index, 2022 shows a higher proportion of days categorized as unhealthy. This suggests that while average conditions have improved, extreme pollution events leading to very high PM2.5 concentrations may have become more frequent.
Reference: https://www.airnow.gov/aqi/aqi-basics/
Question 5
(30 points) Explore the main question of interest at three different levels of spatial resolution. Create data visualizations (e.g. boxplots, histograms, line plots, violin plots) and summary statistics that best suit each level of the analysis. Be sure to write up explanations of what you observe at each level.
Level 1: State
Examine the primary question for the entire state.
Coordinate system already present. Adding new coordinate system, which will
replace the existing one.
p1 / p2
Summary: I have already presented the violin plot in Question 1 and provided summary statistics at the state level in Question 4. In this question, I used histograms to compare the daily PM2.5 concentrations between 2002 and 2022. The solid line represents the mean value and dash line is median.
Overall, at the state level, the mean daily PM2.5 concentration decreased in 2022 compared to 2002. When focusing on the distribution within the 0–100 µg/m³ range, the histograms further show that a larger proportion of days in 2022 fall into lower PM2.5 concentration levels, indicating an overall improvement in air quality.
Level 2: County
Examine the primary question for every county in California.
pd <-position_dodge2(width =0.5, reverse =TRUE, preserve ="single")df %>%ggplot(aes(x = County, y = PM2.5, color =factor(Year, levels =c("2002", "2022")), group = Year)) +stat_summary(fun.data = mean_sdl, geom ="errorbar", position = pd,width =0.3, na.rm =TRUE) +stat_summary(fun.data = mean_sdl, geom ="point", position = pd, size =2,na.rm =TRUE ) +labs(title ="Median Daily PM2.5 Concentration by County (2002 vs 2022)",x ="County", y ="PM2.5", color ="Year" ) +scale_color_manual(values =c("2002"="red", "2022"="blue"),breaks =c("2002", "2022") ) +theme_bw() +coord_flip() +theme(axis.text.y =element_text(size =12))
Summary: From the summary dataset of PM2.5 by county, we can observe that both the mean and median PM2.5 levels decreased from 2002 to 2022, indicating an improvement in air quality. From the plot, however, a few counties show an increase in median PM2.5, including Trinity, Siskiyou, Nevada, Mono, Mendocino, and Del Norte. In addition, some counties only have data available for one of the two years. Overall, the findings suggest a general improvement in statewide air quality, while highlighting localized areas where PM2.5 levels have not improved or have even worsened.
Level 3: City
Restrict the data to sites in Los Angeles county and examine the primary question for every site.
# A tibble: 17 × 7
Local.Site.Name `2002_mean` `2022_mean` `2002_median` `2022_median` diff_mean
<chr> <dbl> <dbl> <dbl> <dbl> <dbl>
1 Azusa 20.8 9.72 18.7 9.65 -11.0
2 Burbank 24.0 NA 21.6 NA NA
3 Lancaster-Divi… 10.4 7.52 10 7.3 -2.86
4 Lebec 4.82 3.50 4.8 3.4 -1.32
5 Long Beach (No… 19.5 9.92 16.8 9.6 -9.55
6 Los Angeles-No… 22.0 11.6 19.3 10.9 -10.4
7 Lynwood 23.3 NA 19.8 NA NA
8 Pasadena 20.3 9.09 17.8 7.9 -11.2
9 Reseda 18.9 10.7 17.0 10.3 -8.14
10 Compton NA 13.0 NA 11.9 NA
11 Glendora NA 8.43 NA 7.8 NA
12 Long Beach (So… NA 12.0 NA 11.3 NA
13 Long Beach-Rou… NA 13.0 NA 12.1 NA
14 North Hollywoo… NA 13.0 NA 13.1 NA
15 Pico Rivera #2 NA 11.4 NA 10.0 NA
16 Santa Clarita NA 9.14 NA 8.5 NA
17 Signal Hill (L… NA 8.86 NA 8.5 NA
# ℹ 1 more variable: diff_median <dbl>
df %>%filter(County =="Los Angeles"&!is.na(Local.Site.Name) & Local.Site.Name !="") %>%group_by(Local.Site.Name, Year) %>%summarise(median_pm25 =median(PM2.5, na.rm =TRUE), .groups ="drop") %>%ggplot(aes(x = Local.Site.Name, y = median_pm25,color =factor(Year), group = Local.Site.Name)) +geom_point(position =position_dodge(width =0.5), size =3) +geom_line(aes(group = Local.Site.Name), color ="grey60") +labs(x ="Local Site Name", y ="Median PM2.5", color ="Year") +scale_color_manual(values =c("2002"="red", "2022"="blue") ) +theme_bw() +theme(axis.text.x =element_text(angle =45, hjust =1))
Summary: From the summary dataset of PM2.5 by local sites in Los Angeles, we can see that some site names are missing, and many sites only have data for one of the two years. This suggests that local monitoring stations may have changed locations over time. For sites with data available in both years, both the mean and median PM2.5 values generally show a decrease, indicating improved air quality. This trend can also be observed in the plot of daily median values. However, due to the limited number of sites with complete data, the evidence may not be sufficient to fully support the conclusion that air quality consistently improved across Los Angeles between 2002 and 2022.